Abstract

We consider the problem of constructing a dense static hash-based lookup table $T$ for a set $S$
of $n$ elements belonging to a universe $U = \{0, 1, 2, \ldots, m-1\}$. We provide nearly tight
bounds on the spatial complexity of oblivious $O(1)$-probe hash functions, which are defined
to depend solely on their search key argument. This establishes a significant gap between
oblivious and non-oblivious search. In particular, our results include the following:

•  Oblivious $k$-probe hash functions require $\Omega(\frac{n}{k^2}e^{-k} + \log\log m)$ bits.

•  A probabilistic construction of a family of oblivious $k$-probe hash functions that can be
specified in $O(ne^{-k} + \log\log m)$ bits, which nearly matches the above lower bound.

•  A variation of an $O(1)$ time 1-probe (perfect) hash function that can be specified in
$O(n + \log\log m)$ bits, which is tight to within a constant factor of the lower bound.

•  Upper and lower bounds for related families of hash functions.
1.   Introduction

Hashing is one of the most important and commonly used methods to organize simple collections of
information. The applications are extensive, and the subject has a correspondingly rich theoretical
literature. In this paper, we restrict our attention principally to the dictionary problem, which
concerns how to organize a set $S$ of distinct keys within a table $T$ so that the elements can be
retrieved quickly. We shall take the data set to be static, so that the hash table need not support
insertion or deletion, and consider only open addressing, so that pointers will not be used. Most of
our results will focus on 100% utilized tables.

The history of this problem, which we detail in the next subsection, would seem to suggest that
virtually all questions have been answered for this specific problem; a variety of lower bounds have
been established [Yao81], [Me84], and elegant constructions have been discovered, which nearly meet
the lower bounds [FKS84].

Recently, however, new techniques for organizing data have been devised [FMNSSS88], [FNSS88], which
show that an enormous amount of information can be encoded within a search table. The thrust of these
results is to show how to exploit non-oblivious search, which can use an adaptive probe strategy based
upon information gleaned from unsuccessful probes. The consequences are performance bounds and
extensions for the dictionary problem that had been believed to be impossible for $O(1)$-probe hashing
schemes [FNSS88].

The unexpected opportunities demonstrated by these results have led us to examine afresh the
computational models and assumptions underlying the lower bound of [Me84] and the construction of
[FKS84]. These results are for 1-probe schemes, which by definition cannot be adaptive. The natural
question to ask is whether the opportunity to use the information encoded within the constant number
of probed locations is genuinely significant, or if the power of the [FNSS88] scheme is actually due
to the ability to query several locations for the search key.

A few preliminary definitions help formalize the problem. We let the data set $S$ comprise $n$
elements belonging to the universe $U = \{0, 1, 2, \ldots, m-1\}$. Our hash-based lookup table $T$ is
also of size $n$. A sequence $H = (h_1, h_2, \ldots, h_k)$ of functions is a $k$-probe hash function
for $S$, if $H : U \to [1,n]^k$ and $T[1..n]$ can be organized so that each item $s \in S$ is located
in one of the $k$ probe positions defined by applying the $k$ probe functions to $s$.

The method of probing must be defined quite precisely. A hashing strategy is oblivious if the search
locations computed by $H$ are based solely on the key $s$, and not on other keys stored in $T$. In
this case, the search strategy is

    for i ← 1, 2, ..., k do
        if T[h_i(s)] = s then return(h_i(s))
    endfor;
    if s has not been found then return(FAIL).

In contrast, a non-oblivious hashing scheme can make computational use of the keys encountered
during unsuccessful search, which offers a wider and conceivably more powerful range of search
strategies. Formally, such a hashing scheme differs in that $h_i$ is an $i$-ary function mapping
$U^i$ into $[1,n]$. The non-oblivious search strategy is

    for i ← 1, 2, ..., k do
        s_i ← T[h_i(s, s_1, s_2, ..., s_{i-1})];
        if s_i = s then return(h_i(s, s_1, s_2, ..., s_{i-1}))
    endfor;
    if s has not been found then return(FAIL).
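The two search strategies above can be sketched in Python as follows. This is only an illustration under stated assumptions: tables are 0-indexed (the text uses $[1,n]$), and the probe functions `h1`, `h2`, `g1`, `g2` and the two small tables are hypothetical examples that do not come from the paper.

```python
from typing import Callable, Optional, Sequence

FAIL = None  # returned when the key is not found within k probes


def oblivious_search(table: Sequence[int],
                     probes: Sequence[Callable[[int], int]],
                     s: int) -> Optional[int]:
    """Probe h_1(s), ..., h_k(s); every location depends on s alone."""
    for h in probes:
        loc = h(s)
        if table[loc] == s:
            return loc
    return FAIL


def non_oblivious_search(table: Sequence[int],
                         probes: Sequence[Callable[..., int]],
                         s: int) -> Optional[int]:
    """Probe h_i(s, s_1, ..., s_{i-1}); later locations may use keys already seen."""
    seen = []
    for h in probes:
        loc = h(s, *seen)
        if table[loc] == s:
            return loc
        seen.append(table[loc])
    return FAIL


# Hypothetical 2-probe oblivious scheme for S = {10, 13, 21} in a full table of size 3.
h1 = lambda x: x % 3
h2 = lambda x: (x // 4) % 3
oblivious_table = [13, 10, 21]

# Hypothetical 2-probe non-oblivious scheme: probe the middle cell, then branch
# on the key that was found there (a one-step binary search).
g1 = lambda s: 1
g2 = lambda s, s1: 0 if s < s1 else 2
sorted_table = [10, 13, 21]
```

In the oblivious example both probe locations for a key are fixed in advance; in the non-oblivious example the second location is chosen only after the first stored key has been read.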

A family $\mathcal{H}$ is a $k$-probe family for $U$ if every $n$-element subset $S \subset U$ has
some $k$-probe hash function in $\mathcal{H}$. A 1-probe hash function for $S$ is called a perfect
hash function, and a perfect family for $U$ denotes a corresponding 1-probe family. (In the exposition
that follows, we shall frequently take the liberty of suppressing the implicitly understood parameters
$n$, $m$ and even $k$, for notational convenience.) The principal problem we analyze is how many hash
functions are needed to define a $k$-probe family $\mathcal{H}$ in the oblivious hashing model. In
particular, we attain estimates for $\log|\mathcal{H}|$, which is the number of bits needed to specify
a particular hash function $H \in \mathcal{H}$.
1.1  Background 

Mehlhorn showed that the bit length of a perfect hash function from $S$ into a table $T$ of size
$n_1$, where $n \le n_1 \le (1-\epsilon)m$, is lower bounded by $\Omega(n^2/n_1 + \log\log m)$ [Me82],
[Me84]. The explicit construction of efficient perfect hash functions has been explored, among others,
by [Me84], [Ma83], [FKS84], [JvE86] and [SvE84]. In [FKS84], an $O(1)$-time computable perfect hashing
scheme for full tables (i.e. $n_1 = n$) is presented, which requires a description of
$O(\log\log m + n\sqrt{\log n})$ bits. [JvE86] reduce the upper bound for $O(1)$ time 1-probe
(FKS-like) hashing to $O(\log\log m + n\log\log n)$ bits. [SvE84] give an $O(n)$-time 1-probe scheme
and show that a variation of the FKS-scheme is space optimal at a cost of taking $O(n)$ time to hash
a value. Their scheme turns out to be surprisingly similar to our $O(1)$ time 1-probe scheme, which is
the first to be optimal in both space and time.

Non-oblivious $O(1)$-probe schemes turn out to be a lot more powerful than 1-probe schemes, where the
question of obliviousness, of course, has no bearing. [FNSS88] present a non-oblivious scheme which
needs only $O(\log\log m + \log n)$ bits, and [FN88] have recently shown that no additional memory is
required when $m$ is polynomial in $n$. No comparable oblivious schemes are known, and the natural
question to resolve is how much of the exponential spatial advantage of $O(1)$-probe non-oblivious
schemes over 1-probe schemes is due to their adaptive character, and how much is due to the
opportunity to perform additional oblivious search. The counting arguments used in [Me84] seem to
provide no help in an effort to construct lower bounds for multi-probe oblivious hashing: the
$\Omega(n)$-bit portion of the argument collapses even for 3-probe schemes.

We show that the spatial complexity of $k$-probe oblivious hash functions for full tables is lower
bounded by $\Omega(n\alpha^k + \log\log m)$, and that this bound is tight with
$e^{-1-2\log k/k} \le \alpha \le e^{-1+5/k}$. In contrast to 1-probe schemes, no comparable lower
bound can be obtained for $O(1)$-probe schemes for load factors less than 1. Indeed, a probabilistic
construction shows that $k$-probe oblivious hash functions can be specified in as few as
$O(\log n + \log\log m)$ bits, when the size $n_1$ of the hash table is $(1+\epsilon)n$, $\epsilon > 0$.

For completeness, we note that Mairson [Ma83], [Ma84] analyzed a number of related problems including
binary search adapted to a page oriented hash scheme, where the cost to read a $2^k$-record page is
fixed, and the data is sorted on each page. He analyzed a scheme limited to reading one page, and
supporting $k$ rounds of binary search. While his scheme is not oblivious since it uses binary search,
its spatial complexity is remarkably close to the spatial complexity of fully oblivious $k$-probe
schemes, analyzed in this manuscript.


In Section 2, we prove that the spatial complexity of a $k$-probe oblivious hash function for a set of
$n$ elements from the universe $U = [1,m]$ is $\Omega(e^{-k}n/k^2)$. Section 3 shows that a random set
of O(2"' log m) $k$-probe hash functions contains a perfect $k$-probe hash function for each
$n$-element subset of $U$ with probability $1 - o(1)$. The first part of Section 4 exhibits an
$O(1)$-time 1-probe hash scheme which uses only $O(\log\log m + n)$ bits of external memory, and
thereby shows that the lower bound for 1-probe schemes can be met for $O(1)$ time hash functions. The
second part of Section 4 gives several explicit, potentially useful constructions of oblivious
multi-probe hash functions. Our explicit constructions use the ideas of universal hashing, in that we
exhibit a family of functions having the property that, for any $S \subset U$, only $1/n$ of the
functions are not valid multi-probe functions. We also extend the lower bounds from [Me82] to a wider
class of families of universal hash functions, show that the bounds are formally tight, and give
explicit constructions that almost match the lower bounds.

2.   Lower  bounds  for  oblivious  search 

Mehlhorn's $\Omega(n)$ bound for the size of a perfect hash function, [Me82], is based on the
following counting argument: the number of functions must be at least the number of $n$-item subsets,
which belong to an $m$-element universe $U$, divided by the maximum number of subsets which can be
mapped one-to-one into $[0, n-1]$ by a single hash function: $count \ge \binom{m}{n}/(m/n)^n$. Taking
the logarithm of this estimate gives the size bound for such hash functions. Unfortunately, this ratio
decreases by a factor of $k^n$ when $k$-probe maps are allowed. Formally, a hash function defines a
bipartite graph on $B = U \times \{0, \ldots, n-1\}$. In the 1-probe case, each vertex in $U$ has
degree 1, and the count of the number of perfect matchings afforded by a single function is precisely
the number of different subsets serviceable by the hash function. Once $k$ probes are allowed, the
degree of each vertex increases by a factor of $k$, and the obvious $(km/n)^n$ counts the number of
possible matchings from all $n$-subsets of $U$, which overcounts the number of serviceable subsets by
as much as a factor of $k^n$.
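The counting argument can be checked numerically. The sketch below evaluates $\log_2$ of the ratio $\binom{m}{n}/(m/n)^n$ and of the naive $k$-probe ratio $\binom{m}{n}/(km/n)^n$; the function names and the sample values $m = 1000$, $n = 100$ are assumptions chosen for illustration. Because $\binom{m}{n} \le (em/n)^n$, the naive bound is negative whenever $k \ge 3 > e$, which is one way to see why the counting argument collapses for 3-probe schemes.

```python
import math


def perfect_hash_count_bound_bits(m: int, n: int) -> float:
    """log2( C(m, n) / (m/n)^n ): bits implied by the 1-probe counting argument."""
    return math.log2(math.comb(m, n)) - n * math.log2(m / n)


def k_probe_naive_bound_bits(m: int, n: int, k: int) -> float:
    """Same ratio with (k*m/n)^n in the denominator; smaller by n*log2(k) bits."""
    return math.log2(math.comb(m, n)) - n * math.log2(k * m / n)


one_probe = perfect_hash_count_bound_bits(1000, 100)
three_probe = k_probe_naive_bound_bits(1000, 100, 3)
```

Here `one_probe` is positive, while `three_probe` is negative and therefore gives no information.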

Consequently, we are obliged to model the $k$-probe scheme more carefully. We model a family of such
hash functions as a set $\mathcal{H}$ of bipartite graphs on $B$, where each node of $U$ has degree
$k$, for each graph. $\mathcal{H}$ contains a $k$-probe perfect hash function for each $n$-element
subset of $U$ if and only if each $n$-element subset has a perfect matching for some
$H \in \mathcal{H}$. The lower bound for describing such $k$-probe oblivious hash functions is thus
$\log|\mathcal{H}|$.

To estimate $|\mathcal{H}|$, we choose an arbitrary $H \in \mathcal{H}$ and a randomly selected
$n$-element set $V \subset U$, and compute an upper bound $p$ for the probability that
$H \in \mathcal{H}$ provides a perfect matching for $V$. It then follows that the number of
$n$-element subsets of $U$ for which $H$ is perfect is at most $\binom{m}{n}p$. Thus

(1)  $|\mathcal{H}| \ge 1/p$

and members of $\mathcal{H}$ cannot be identified in fewer than $\log_2 \frac{1}{p}$ bits.

By  letting  V  be  selected  at  random,  we  may  imagine  an  honest  intermediary  who  conveys,  as  answers 
to  queries  by  our  (partial)  matching  algorithm,  precise  (minimal  amounts  of)  information  about  the  items 
selected. 

We will estimate the probability that a carefully selected subset $\eta \subseteq [1,n]$ of $n/ck$
items is covered by edges emanating from $V$. (The exact value of $c$ will be specified later.)


For $v \in U$, we define $Image(v) = \{i \in [1,n] \mid (v,i)$ is an edge in the graph $H\}$.

For $i \in [1,n]$ we define $Preimage(i) = \{v \in U \mid (v,i)$ is an edge in the graph $H\}$.

Let $P_i = |Preimage(i)|$. Let the tuple $\eta$ comprise the elements $i \in [1,n]$, sorted in order
of increasing $P_i$ values.

By construction, the first items in $\eta$ are not overly likely to be covered by the edges emanating
from a randomly selected set $V$. The selection process for $\eta$ proceeds as follows:

0.  Let $V_0 \leftarrow V$; $\eta_0 \leftarrow \{1, \ldots, n\}$.

1.  For $i = 0$ to $n/ck - 1$ perform 2 through 5:

2.  Let $\nu_i$ be an element with minimum $P_{\nu_i}$ in $\eta_i$.

3.  Sequence through the elements of $V_i$ and stop at the first $v \in V_i$ which is in
$Preimage(\nu_i)$. If no such $v$ is found return "fail"; $H$ is not perfect for $V$.

4.  $V_{i+1} \leftarrow V_i - v$.

5.  Set $\eta_{i+1} \leftarrow \eta_i - Image(v)$.

Note that the items in $\eta_{i+1}$ do not have a preimage in $V_0 - V_i$ and
$|\eta_{i+1}| \ge |\eta_0| - ik \ge n - n/c$.

6.  $\eta \leftarrow \{\nu_0, \ldots, \nu_{n/ck-1}\}$.

If the above procedure fails, then there is no matching from $V$ to $[1,n]$. (The converse is of
course not true, but our aim is to upper bound the probability of a success.)
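The selection procedure can be sketched as follows. This is an illustrative reading of steps 0 through 6, not code from the paper: cells are 0-indexed, the graph is passed as a hypothetical list `images` whose entry `images[v]` holds the $k$ probe cells of universe element `v`, ties in step 2 are broken by cell index, and the small example graph is invented.

```python
def select_eta(images, n, V, k, c):
    """Run steps 0-6; return the list eta, or None when step 3 reports "fail"."""
    # preimage[i] = set of universe elements with an edge to cell i
    preimage = [set() for _ in range(n)]
    for v, cells in enumerate(images):
        for i in cells:
            preimage[i].add(v)

    remaining_V = list(V)            # V_i, scanned in the given order
    eta_i = set(range(n))            # eta_i
    eta = []
    for _ in range(int(n // (c * k))):
        if not eta_i:                # cannot happen for c >= 1; kept as a guard
            break
        # Step 2: a cell of eta_i with the fewest preimages.
        nu = min(eta_i, key=lambda i: (len(preimage[i]), i))
        # Step 3: first element of V_i that covers nu.
        hit = next((v for v in remaining_V if v in preimage[nu]), None)
        if hit is None:
            return None              # H is not perfect for V
        remaining_V.remove(hit)      # Step 4
        eta_i -= set(images[hit])    # Step 5: Image(hit) is revealed and removed
        eta.append(nu)
    return eta


# Hypothetical graph: m = 6 universe elements, n = 3 cells, k = 2 probes each.
example_images = [(0, 1), (0, 1), (1, 2), (0, 2), (1, 2), (0, 1)]
```

With `V = [0, 1, 5]` every element probes only cells 0 and 1, so cell 2 cannot be covered and the procedure reports failure, as it must since three keys cannot share two cells.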

We have to estimate the probability that $V$ contains an element in $Preimage(\nu_i)$, for
$0 \le i < n/ck$, at step 3. The query at step 3 is whether $v \in V_i$ is in $Preimage(\nu_i)$. Only
at step 5 is $Image(v)$ actually revealed. Since $P_{\nu_0} \le km/n$, the probability that the
$j$-th element in $V_0$ is the first one to cover $\nu_0$ is

$$Prob\{v_j \in Preimage(\nu_0) \mid v_1, \ldots, v_{j-1} \notin Preimage(\nu_0)\} \le \frac{km}{n(m-(j-1))} \le \frac{k}{n(1-n/m)}.$$

The estimates of the probabilities of success for $i > 0$ in step 3 are slightly more delicate. In
particular, the event "$v'_j \in Preimage(\nu_i) \mid v'_1, \ldots, v'_{j-1} \notin Preimage(\nu_i)$"
has to be conditioned by the additional information concerning $v'_j$ acquired by previous queries.
However, all we have learned is that $v'_j$ is not in the $Preimage$ of some of the elements
$\nu_0, \nu_1, \ldots, \nu_{i-1}$, that $v'_j \notin V - V_{i-1}$ and
$v'_j \notin \{v'_1, \ldots, v'_{j-1}\}$. The probability that $v'_j \in Preimage(\nu_i)$ is maximized
if all the eliminated candidates are not in $Preimage(\nu_i)$. Since $\nu_i$ (selected in step 2) is
among the $ki + 1$ smallest elements in $\eta_0$, it follows that
$\sum_{l=0}^{i} P_{\nu_l} \le ikm/n$; also $V - V_{i-1} \subset \bigcup_{l=0}^{i-1} Preimage(\nu_l)$,
and therefore in the most extreme case, the conditional information can eliminate at most
$\sum_{l=0}^{i-1} P_{\nu_l} + j - 1 \le m/c + n$ elements of $U$ as candidates for $v'_j$. It follows
that:


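$$Prob\{v'_j \in Preimage(\nu_i) \mid \text{all previous events}\} \le \frac{P_{\nu_i}}{m - m/c - n},$$

$$Prob\{\nu_i \in Image(V_i) \mid \text{all previous events}\} \le 1 - \Bigl(1 - \frac{P_{\nu_i}}{m - m/c - n}\Bigr)^{n} \le 1 - (1 - o(1))\,e^{-P_{\nu_i} n/(m - m/c - n)}.$$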


And finally

$$p = Prob\{H \text{ is perfect for } V\} \le Prob\{\text{the Image of } V \text{ contains all the elements in } \eta\} \le \prod_{i=0}^{n/ck-1} \Bigl(1 - (1 - o(1))\,e^{-P_{\nu_i} n/(m - m/c - n)}\Bigr).$$


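Since $\sum_{l=0}^{i} P_{\nu_l} \le ikm/n$ by construction, the above product is maximized,
effectively, when all $P_{\nu_i}$ are set to $km/n$. We choose $c = (k+1)m/(m-n)$, so that
$m - m/c - n = (m-n)k/(k+1)$ and the product has $n/ck = n(1-n/m)/((k+1)k)$ factors, and conclude that

$$p = Prob\{H \text{ is perfect for } V\} \le \Bigl(1 - (1 - o(1))\,e^{-(k+1)/(1-n/m)}\Bigr)^{n(1-n/m)/((k+1)k)}.$$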


Computing $\log_2 \frac{1}{p}$ gives roughly $\frac{n}{k^2}e^{-(k+1)/(1-n/m)}$ bits necessary to
specify an oblivious search strategy. We have proved

Theorem 1.

The spatial complexity of a $k$-probe oblivious hash function for a set of $n$ elements belonging to
the universe $[1,m]$ is $\Omega(\frac{n}{k^2}e^{-(k+1)/(1-n/m)})$, (or $\Omega(\frac{n}{k^2}e^{-k})$
for $n = o(m)$).  ∎

The above lower bound for $|\mathcal{H}|$ is not an increasing function of $m$.

However, a simple information theoretic argument, which was mentioned to us independently by [FN] and
[F], shows that $\Omega(\log m/(k\log n))$ is also a lower bound on the size of $|\mathcal{H}|$. The
argument is given here for completeness. A $k$-probe hash function $H$ maps elements of $U$ into
$\{1, \ldots, n\}^k$. One of the conditions a $k$-probe scheme satisfies is that no $k+1$ items in $S$
have all $k$ probe locations in common. Thus

$$|\mathcal{H}| \ge \frac{\log(m/k)}{k\log n}$$

and $\Omega(\log\log m - \log\log n)$ bits are needed to name $H$.

Combining  the  information  theoretic  bound  with  Theorem  1  we  get 

Theorem 2.

The spatial complexity of a $k$-probe oblivious hash function for a set of $n$ elements belonging to
the universe $[1,m]$ is $\Omega(\log\log m + \frac{n}{k^2}e^{-(k+1)/(1-n/m)})$.  ∎
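To get a feel for the two terms, the following sketch evaluates them numerically. It is only illustrative: the $o(1)$ term in the bound on $p$ is dropped, the information theoretic term is taken as $\log_2$ of $\log(m/k)/(k\log n)$ and clamped at zero, and the function names are assumptions made here.

```python
import math


def matching_bound_bits(n: int, m: int, k: int) -> float:
    """log2(1/p) for p <= (1 - e^{-(k+1)/(1-n/m)})^{n(1-n/m)/((k+1)k)}, o(1) dropped."""
    slack = 1.0 - n / m
    q = math.exp(-(k + 1) / slack)
    factors = n * slack / ((k + 1) * k)
    return -factors * math.log2(1.0 - q)


def pigeonhole_bound_bits(n: int, m: int, k: int) -> float:
    """log2 of log(m/k) / (k log n), the information theoretic bound on the family size."""
    size = math.log(m / k) / (k * math.log(n))
    return math.log2(size) if size > 1 else 0.0
```

For fixed $n$ and a large universe, `matching_bound_bits` falls off quickly as $k$ grows, while `pigeonhole_bound_bits` only becomes positive once $m$ is far larger than $n^k$.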

By appealing to a hypergraph model and extending the proof technique of Theorem 1, we can incorporate
the method of Fredman and Komlós [FK84] to show that $|\mathcal{H}|$ is actually lower bounded by the
product of the two bounds, rather than the maximum of the two. This gives the slightly stronger result

Theorem 3.

The size $|\mathcal{H}|$ of a family of $k$-probe hash functions from a universe $U$ to a table of size $n$ is:   Ω((1 +

"  '  login'-)    ■  ■ 

The above theorems also show that the lower bound of $\Omega(n)$ for the spatial complexity holds even
for relatively small universes (such as $m \approx 2n$). On the other hand, for $k > 1$, the lower
bound does not hold if we were to map $n$ elements into a table of size $(1+\epsilon)n$. The upper
bounds in the second part of Section 3 show that this is not an artifact of our proof method.


3.   Upper  bounds  for  oblivious  search 
3.1  Oblivious k-probe schemes

Our lower bounds suggest that, while $k$-probe perfect hash schemes must have a reasonably large
program length, in the case of oblivious search, the $k-1$ additional search opportunities might
reduce the length by a factor of about $e^k$ in its $n$ dependence. We appeal to methods from the
theory of random graphs to give a nonconstructive demonstration that this reduction is indeed
possible, at least in a formal sense. The proof is actually a probabilistic construction of a
$k$-probe oblivious hash function. For $k = 1$, such a probabilistic construction has been given in
[Me82]. This proof for $k = 1$ is quite simple and, in fact, both the lower and upper bound were
obtained by the same argument; when additional probes are permitted, the lower bound argument, as we
have seen, is more delicate, while the upper bound, as we now demonstrate, requires much more care.

The procedure is to imagine constructing a random bipartite graph $G$ on $U \times [1,n]$, where each
vertex in $U$ has degree $k$. We then select a random set $S$ of $n$ items from $U$ and estimate the
probability $p$ that $G$ contains a perfect matching on $S$. With positive probability, a family
$\mathcal{H}$ of $\log_{1/(1-p)}\binom{m}{n}$ such random graphs will, for each $n$-element subset in
$U$, contain some graph that has a perfect matching for that set, and it follows, therefore, that such
an $\mathcal{H}$ exists. The number of bits required to designate an element $H \in \mathcal{H}$ is
then $\log\log_{1/(1-p)}\binom{m}{n}$.
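The size of such a family, and the number of bits needed to name one of its members, can be evaluated directly. The sketch below is a plain transcription of the two expressions; the function names are assumptions, and `p` is any success probability strictly between 0 and 1.

```python
import math


def family_size(m: int, n: int, p: float) -> float:
    """log_{1/(1-p)} C(m, n): the family size used in the probabilistic construction."""
    return math.log(math.comb(m, n)) / -math.log1p(-p)


def designation_bits(m: int, n: int, p: float) -> float:
    """log2 of the family size: bits needed to name one graph in the family."""
    return math.log2(family_size(m, n, p))
```

With $p = 1/2$ the family size is simply $\log_2\binom{m}{n}$, and a larger $p$ gives a smaller family and hence fewer designation bits.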

Accordingly, let $S \subset U$, $|S| = n$ be the subset we wish to match. Let, for convenience, $G$ be
used to name the portion of our random graph that is restricted to $S \times [1,n]$. A few preliminary
remarks and definitions will help simplify the subsequent exposition.

•  We recall Hall's (a.k.a. The Marriage) Theorem: A matching exists if and only if for all $j$, every
set of $j$ vertices in $S$ has at least $j$ vertices in $[1,n]$ that are connected to the set. Let
Hall($j$) be the property that every set of $j$ vertices in $S$ has at least $j$ vertices in $[1,n]$
that are connected to the set.

•  Let, for each $s \in S$, the first 3 random edges that are chosen to emanate from $s$ be gold. The
remaining $k - 3$ edges will be green.

•  Let $A$ be the event that the Hall($j$) property holds for $j \le n/4$ for the subgraph restricted
to the gold edges.

•  Let $B$ be the event that the Hall($j$) property holds for $n/4 < j \le n - n/(k-3)$, where both
gold and green edges are used.

•  We will be running a matching algorithm which will find augmenting paths via breadth first search
on $G$, and will be selecting the random edges of $G$ as the algorithm progresses. Let the set $X$
comprise the first $n/4$ vertices of $[1,n]$ that are matched (by gold edges, as it happens).

•  Let $C$ be the event that the first $(k-4)n$ green edges encountered by the algorithm cover
$[1,n] - X$. We could imagine an adversary taking note of matched vertices, which will be announced
whenever the partial matching is augmented. It will also be told of every green edge that is
encountered by the matching algorithm.
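For very small instances the Hall($j$) property can be checked by brute force, which may help when experimenting with the definitions above. The sketch assumes a hypothetical representation in which `neighbors[s]` lists the cells adjacent to key `s`; it enumerates all $j$-subsets and is exponential in general.

```python
from itertools import combinations


def hall(j, neighbors):
    """True if every set of j keys is connected to at least j distinct cells."""
    return all(
        len(set().union(*(neighbors[s] for s in group))) >= j
        for group in combinations(range(len(neighbors)), j)
    )


# Hypothetical 2-probe examples on 3 keys and 3 cells.
spread_out = [(1, 2), (1, 0), (0, 2)]   # Hall(j) holds for j = 1, 2, 3
crowded = [(0, 1), (0, 1), (0, 1)]      # three keys share two cells: Hall(3) fails
```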

We use a standard matching algorithm (c.f. [HK73]): For each $s \in S$, a new breadth first search is
initiated to find an augmenting path from $s$. A bfs level explores non-matching edges from a current
vertex set $R \subset S$, which is initially $\{s\}$. The search for $s$ is successful if it discovers
a vertex in $[1,n]$ that has not been visited by any of the searches, and is therefore unmatched.
Otherwise, the currently matched mates (in $S$) of the $[1,n]$ vertices that are newly encountered in
the current level of the search (i.e., that have not been previously seen by the current search) are
used to replace $R$ for the next level of exploration. Once a previously unmatched vertex
$v \in [1,n]$ is visited, the alternating property of being matched or not, along the path of edges
from $s$ to $v$, is reversed. Then a new $s$ is selected and a new search initiated.
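A minimal sketch of this augmenting-path search is given below. It follows the description above but is not the two-phase gold/green analysis itself: cells are 0-indexed, `probes_of` is a hypothetical callback returning the $k$ probe cells of a key, and `estimate_p` is an assumed helper that draws random $k$-probe graphs (with replacement) to estimate the matching probability empirically.

```python
import random
from collections import deque


def place_keys(keys, probes_of, n):
    """Match every key to one of its probe cells; return the table, or None on failure."""
    owner = [None] * n        # owner[cell] = key currently stored in that cell
    where = {}                # where[key] = cell currently holding that key
    for s in keys:
        parent = {}           # parent[cell] = key from which this search reached the cell
        queue = deque([s])
        free = None
        while queue and free is None:
            u = queue.popleft()
            for cell in probes_of(u):
                if cell in parent:
                    continue
                parent[cell] = u
                if owner[cell] is None:
                    free = cell               # an unmatched cell ends the search
                    break
                queue.append(owner[cell])     # continue from the cell's current mate
        if free is None:
            return None                       # no augmenting path from s
        cell = free
        while True:                           # reverse matched/unmatched along the path
            u = parent[cell]
            previous = where.get(u)
            owner[cell], where[u] = u, cell
            if u == s:
                break
            cell = previous
    return owner


def estimate_p(n, k, trials=200, seed=0):
    """Fraction of random k-probe graphs on n keys that admit a perfect matching."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        cells = [tuple(rng.randrange(n) for _ in range(k)) for _ in range(n)]
        if place_keys(range(n), lambda s: cells[s], n) is not None:
            hits += 1
    return hits / trials
```

For example, with the hypothetical probes `lambda x: (x % 3, (x // 4) % 3)` the keys 10, 13, 21 are placed as `[13, 10, 21]`.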

Our use of the algorithm has two constructive phases, which are followed by a postamble. The first
phase pursues a partial matching on the subgraph of gold edges, until a set $X$ of $n/4$ vertices of
$S$ have been matched, or a subset of $j \le n/4$ elements have been encountered that violates
Hall($j$). In the second phase, the count is extended, if possible, to $n$ via both gold and green
edges. If a match does not occur by the time $(k-4)n$ green edges have been used in phase 2, the
algorithm fails. The postamble can be entered upon failure encountered during phases 1 or 2, or upon a
successful matching (completed in phase 2). If success or failure occurs before $(k-4)n$ green edges
are explored, we may imagine that the algorithm continues to select and report new green edges until
the requisite number of edges is used.

The algorithm can switch to the fail state in phase 1 only if $A$ does not hold. In phase 2, failure
can be due to the fact that a new $s$ does not have an augmenting path in a breadth first search that
fails because $A$ or $B$ does not hold, or because the breadth first search touches $n(1 - 1/(k-3))$
vertices, which fail to have green edges that cover enough of $[1,n] - X$. In this latter case, $C$
does not hold.

These  observations  prove 

Lemma 4.

$Pr\{matching\} \ge Pr\{A \cap B \cap C\}$.  ∎
Consequently, $Pr\{matching\} \ge Pr\{B \cap C \mid A\}\,Pr\{A\} \ge (Pr\{B \mid A\} + Pr\{C \mid A\} - 1)\,Pr\{A\}$.

We now estimate these probabilities. The gold edges, for each vertex, are selected without replacement
(i.e., each triple of edges comprises three distinct members), but the remaining $k - 3$ edges are
selected with replacement for computational convenience. It is also helpful to denote $[1,n] - X$ by
$\{y_1, y_2, \ldots, y_{3n/4}\}$.
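There follows:

$$Pr\{A\} \ge 1 - \sum_{3 < j \le n/4} \binom{n}{j}\binom{n}{j-1}\Bigl(\frac{j-1}{n}\Bigr)^{3j} = 1 - o(1),$$

$$Pr\{B \mid A\} \ge 1 - \sum_{n/4 < j \le n(k-4)/(k-3)} \binom{n}{j}\binom{n}{j-1}\Bigl(\frac{j-1}{n}\Bigr)^{(k-3)j} \ge 1 - o((4/5)^{3n/4}), \quad k \ge 6.$$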

Let $hit(y)$ be the event that $y$ is hit by one of the first $(k-4)n$ green edges examined. Note that
$Pr\{C \mid A\} = Pr\{C\}$.


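$$Pr\{C\} = Pr\{hit(y_1)\} \times Pr\{hit(y_2) \mid hit(y_1)\} \times \cdots \times Pr\{hit(y_{3n/4}) \mid hit(y_1), \ldots, hit(y_{3n/4-1})\}$$

$$\ge \Bigl(1 - \Bigl(\frac{n-1}{n}\Bigr)^{n(k-5)}\Bigr)^{3n/4} \ge \bigl(1 - e^{-k+5}\bigr)^{3n/4}.$$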

Since $1 - Pr\{B \mid A\} = o(Pr\{C \mid A\})$ for $k \ge 6$, and $Pr\{A\} = 1 - o(1)$, the
probability $p$ of a matching is $\ge (1 - e^{-k+5})^{3n/4}(1 - o(1))$.

As we have already observed, a family of $|\mathcal{H}| = \log_{1/(1-p)}\binom{m}{n}$ such random
graphs can supply a hash function (graph) for all $n$-element subsets. Computing
$\log_2\log_{1/(1-p)}\binom{m}{n}$ gives $b \approx \log_2\log_2 m + \frac{3}{4}e^{5-k}n$ bits to
designate which random graph gives a $k$-probe perfect hash function for a given set $S$. If time and
workspace are of no significance, we can follow the theme given for 1-probe hashing in [Me82], which
is to use the lexicographically smallest collection $\mathcal{H}$ among all collections of such $2^b$
bipartite graphs which supports $k$-probe hashing for all $n$-item subsets of $[0, m-1]$. Then the
$b$-bit number $H$ names a graph within that collection (with respect to the lexicographic order
within $\mathcal{H}$). The desired datum is found in at most $k$ probes, as specified by the graph's
edges.
We  have  shown 

Theorem 5.

A $k$-probe hash function for a set of $n$ elements belonging to the universe $[1,m]$ into a table of
size $n$ can be specified in $\log\log m + O(ne^{-k})$ bits, which is tight to within a factor of
$k^2$ of the lower bound, (or tight to within a constant factor for constant $k$).  ∎

Finally, we remark that a straightforward application of Hall's Theorem shows that if we were to map
the elements of $S$ into a table of size $2n$, then the probability of a successful matching using 3
probes is 1 - o(n-^). It follows that

Observation 6.

A 3-probe hash function for a set of $n$ elements belonging to the universe $[1,m]$ into a table of
size $2n$ can be specified in $O(\log\log m + \log n)$ bits.  ∎

4.   Explicit constructions and related bounds
4.1  An optimal O(1)-time 1-probe scheme

The upper bound in the previous section uses a probabilistic construction to establish the spatial
complexity of $k$-probe oblivious hash functions. Unfortunately, such a technique does not yield any
explicit $O(1)$-time hash function.

The hashing technique given in [FKS84] uses a table of linear congruential functions of the form
$(ax \bmod p) \bmod q$. The basic idea, which we detail below, is to use such a hash function to define
an implicit partition of the data set into a favorably distributed collection of collision buckets,
and to look up, given a bucket index, an explicit (locally defined) secondary hash function which
determines a unique address (within the bucket) for the item sought.

Altogether, the address evaluation uses a few $\log n$-bit arithmetic computations, several array
accesses, and one long word computation of the form $(kx \bmod p) \bmod n^2$, where $k, p \in U$ and
$p < n^2\log m$. It is natural to ask if the [FKS84] scheme or some other reasonable hash scheme can
be expressed in an optimal $O(\log\log m + n)$ bits, while maintaining $O(1)$ time search, as
conjectured in [SvE84].

We show that an [FKS84]-based scheme can be expressed in an optimal number of bits (up to a
multiplicative factor), and performed with essentially the same selection of $O(1)$ arithmetic
operations and array accesses.

We use the usual machine model associated with this problem, which is somewhat idealized. The model is
basically that of a random access machine. In particular, an array access of an $O(\log n)$-bit word
takes unit time, and index computations are permitted. Words can be added, subtracted, multiplied and
(integer) divided in constant time. We also use the same single long word computation as in [FKS84]
and [Me82] to map elements from $U$ into $[0, n^2]$. Similarly, we employ the expositional expediency
of calling this computation an $O(1)$ time operation.

The  original  FKS  construction  has  four  basic  steps. 

1. A (long word) function $h_\alpha(x)$ is found that maps $S$ into $[0, n^2 - 1]$ without collisions, so that all
subsequent computation can use normal length words: $h_\alpha(x) = (kx \bmod p) \bmod n^2$, where $k, p < n^2 \log m$ and
$p$ is prime. (Corollary 2 and Lemma 2 in [FKS84].) Let $E$ be the image of $S$ under $h_\alpha$.

2. Next, a function $h_\beta(x)$ is found that maps $E$ into $[0, n - 1]$ so that the sum of the squares of the
collision sizes is not too large: $h_\beta(x) = (kx \bmod p) \bmod n$, where $p > n^2$ is prime and $k \in [0, p]$ is chosen so that
$\sum_{0 \le j < n} |h_\beta^{-1}(j) \cap E|^2 < 3n$. (Corollary 3 in [FKS84].)

3. For each hash bucket (i.e., integer having at least one appearance in the multiset $\{h_\beta(t)\}_{t \in E}$) a secondary
hash function $h_i$ is found that is one-to-one on the collision set. Let $c_i$ be the size of the collision set for
bucket $i$: $c_i = |h_\beta^{-1}(i) \cap E|$. Then for $x \in h_\beta^{-1}(i) \cap E$, $h_i(x) = (k_i x \bmod p) \bmod c_i^2$, where $k_i \in [0, p - 1]$.
The item $s \in S$ (which is represented by $t = h_\alpha(s)$ and located in hash bucket $i = h_\beta(t)$) is stored in
location $C_i + h_i(h_\alpha(s))$, where $C_i = c_0^2 + c_1^2 + \cdots + c_{i-1}^2$. This locates all $n$ items within a size $3n$ table.

4. Finally, the table is stored without vacant locations in a size $n$ array $A[1..n]$, i.e., the array contains the
items of $S$ in the order of appearance in the size $3n$ table.
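
The four steps can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions, not the construction of [FKS84] verbatim: step 1 is skipped (the keys are assumed to fit in normal words already), the multipliers are found by exhaustive search rather than by random selection, and the names `next_prime`, `build_fks` and `lookup` are hypothetical.

```python
from itertools import count

def next_prime(x):
    """Smallest prime strictly greater than x (trial division, demo only)."""
    for cand in count(max(2, x + 1)):
        if all(cand % d for d in range(2, int(cand ** 0.5) + 1)):
            return cand

def build_fks(keys):
    """Steps 2-4 for distinct non-negative integer keys (step 1 is skipped)."""
    n = len(keys)
    p = next_prime(max(keys))
    # Step 2: first multiplier k whose squared bucket sizes sum to < 3n.
    for k in range(1, p):
        buckets = [[] for _ in range(n)]
        for x in keys:
            buckets[(k * x % p) % n].append(x)
        if sum(len(b) ** 2 for b in buckets) < 3 * n:
            break
    # Step 3: per-bucket multiplier k_i, one-to-one into a range of size c_i^2.
    ks, offsets, sparse = [], [], []
    for b in buckets:
        c2 = len(b) ** 2
        ki = next(m for m in range(1, p)
                  if len({(m * x % p) % c2 for x in b}) == len(b))
        ks.append(ki)
        offsets.append(len(sparse))          # C_i = c_0^2 + ... + c_{i-1}^2
        cells = [None] * c2
        for x in b:
            cells[(ki * x % p) % c2] = x
        sparse.extend(cells)
    offsets.append(len(sparse))              # sentinel, gives c_i^2 by difference
    # Step 4: compress the sparse table into A; D maps sparse slots to A.
    A, D = [], []
    for item in sparse:
        D.append(len(A) if item is not None else -1)
        if item is not None:
            A.append(item)
    return {"p": p, "n": n, "k": k, "ks": ks, "offsets": offsets, "D": D, "A": A}

def lookup(t, x):
    """Index of x in A, or None if x is not stored."""
    i = (t["k"] * x % t["p"]) % t["n"]
    c2 = t["offsets"][i + 1] - t["offsets"][i]
    if c2 == 0:
        return None
    j = t["D"][t["offsets"][i] + (t["ks"][i] * x % t["p"]) % c2]
    return j if j >= 0 and t["A"][j] == x else None
```

Here the list `D` plays the role of the compression table and `offsets` the role of the values $C_i$; both are stored uncompressed, which is exactly the $O(n \log n)$-bit overhead that the remainder of the section removes.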

The composite hash function requires $h_\alpha$, $h_\beta$, a table $K[0..n]$ storing the parameters $k_i$ for the secondary
functions $h_i$, a table $C[0..n]$ listing the $C_i$, and a compression table $D[1..3n]$, where $D[j]$ gives the index, for
$A$, of the item (if any) that hashes to the value $j$ by the function outlined in 1 through 3. As presented, the
description of this perfect hash function requires $2 \log\log m + O(n \log n)$ bits.

In choosing the secondary hash functions $h_i$ for each bucket, we appeal to [FKS84]. They point out
that since their existence proofs are based on expected case analysis, at least one half of the numbers in
$[0, p - 1]$ will yield appropriate functions $h_i$ if the hash range is doubled. In particular, their Corollary 4
shows that given a collision set of any $c_i$ items in $E$, at least half of the numbers in $[0, p - 1]$ will, if selected
as multiplier $k_i$, yield a function $h_i(x) = (k_i x \bmod p) \bmod 2c_i^2$ that is one-to-one on the set. Consequently,
if we have $z$ collision buckets requiring multipliers, we may select a single multiplier that is one-to-one for
$\lceil z/2 \rceil$ of the buckets, provided we double the size of the hash range in each case. Altogether, we need at most
$\lfloor \log n \rfloor + 1$ different $k_i$ values, where $k_i$ is the $i$-th multiplier servicing about $1/2^i$ of the buckets. Moreover, the
information content of the map from bucket indices into these $\lfloor \log n \rfloor + 1$ multipliers (i.e., the representation
of the table $K$ with Huffman coding) is $O(n)$.
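
The halving argument can be illustrated with a small greedy search. The sketch below is an assumption-laden illustration (exhaustive search over multipliers, hypothetical names `injective` and `share_multipliers`): in each round it takes the first multiplier that is one-to-one, with doubled range $2c_i^2$, on at least half of the buckets still pending.

```python
def injective(k, bucket, p):
    """True if x -> (k*x mod p) mod 2c^2 is one-to-one on the bucket."""
    r = 2 * len(bucket) ** 2
    return len({(k * x % p) % r for x in bucket}) == len(bucket)

def share_multipliers(buckets, p):
    """Assign each bucket one of a short list of shared multipliers."""
    pending = list(range(len(buckets)))
    multipliers, assign = [], {}
    while pending:
        need = (len(pending) + 1) // 2       # at least half of what is left
        for k in range(1, p):
            ok = [i for i in pending if injective(k, buckets[i], p)]
            if len(ok) >= need:
                break
        else:
            raise ValueError("no shared multiplier found")
        multipliers.append(k)
        for i in ok:
            assign[i] = len(multipliers) - 1  # index of the multiplier used
        pending = [i for i in pending if i not in assign]
    return multipliers, assign
```

Because each round settles at least half of the pending buckets, $z$ buckets are exhausted after at most $\lfloor \log z \rfloor + 1$ rounds, which is the count of multipliers used in the text.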

In  summary,  the  optimal  perfect  hash  function  is  that  described  above,  with  two  modifications: 

1. The (uncompressed) hash value is $h_i(x) = ((k_i x \bmod p) \bmod 2c_i^2) + 2C_i$.

2. The multiplier values $k$ are iteratively selected to satisfy the one-to-one requirement of the maximum
number of (unsatisfied) $h_i$'s.

It remains to show how to achieve a compact encoding of the content of the tables $K$, $C$, and $D$ that requires
only $O(n)$ bits and is readily decodable in $O(1)$ time in our computational model. The basic decoding
operations we will use are as follows:

1.  Extract  a  subsequence  of  bits  from  one  word. 

2. Concatenate two bit strings of altogether $O(\log n)$ bits.

3. Locate the bit-index of the $k$-th zero in a word.

4. Count the number of consecutive 1's in an $O(\log n)$-bit word, starting at the beginning.

5. Count the number of zeros in an $O(\log n)$-bit word.

6.  Access  a  few  constants. 

The first two operations are a matter of arithmetic. The last four can be performed with small look-up
tables. In particular, it suffices to break the words into words of $(\log n)/2$ bits (padded with leading zeros),
and to use these words to access decoder arrays. The third operation, for example, requires for each $k$,
$0 < k < (\log n)/2$, an array of $\sqrt{n}$ words. A word of $2 \log n$ bits requires up to four accesses to compute the
location of a $k$-th 0. The fourth and fifth operations are accomplished in a similar manner.
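
As a rough illustration of the table-driven decoding, the following sketch implements operation 3 for words split into fixed-width chunks. The chunk width `W = 4` is an arbitrary stand-in for $(\log n)/2$, bits are indexed from the left, and the names `ZERO_COUNT`, `KTH_ZERO` and `kth_zero` are hypothetical.

```python
W = 4  # chunk width in bits; stands in for (log n)/2

def chunk_bits(v):
    """Bits of a W-bit chunk, most significant first."""
    return [(v >> (W - 1 - b)) & 1 for b in range(W)]

# Decoder arrays, one entry per possible chunk value.
ZERO_COUNT = [chunk_bits(v).count(0) for v in range(1 << W)]
KTH_ZERO = [[None] * (1 << W) for _ in range(W + 1)]
for v in range(1 << W):
    seen = 0
    for pos, bit in enumerate(chunk_bits(v)):
        if bit == 0:
            seen += 1
            KTH_ZERO[seen][v] = pos   # position of the seen-th zero in chunk v

def kth_zero(word, nchunks, k):
    """Bit index (from the left) of the k-th zero, k >= 1, or None."""
    for c in range(nchunks):
        chunk = (word >> (W * (nchunks - 1 - c))) & ((1 << W) - 1)
        z = ZERO_COUNT[chunk]
        if k <= z:
            return c * W + KTH_ZERO[k][chunk]
        k -= z                        # skip the zeros of this chunk
    return None
```

With a constant number of chunks per word, each call performs a constant number of table accesses, which mirrors the "up to four accesses" remark above.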

The table $C$, which contains the values $C_j$, is encoded as follows. First, the squares of the various lengths
$c_j$ are stored in a table $T_0$ in unary notation (in order of appearance, separated by 0's). (Note that the bit
length of $T_0$ is at most $4n$.) We let $\mathit{str}_j$ denote the string that encodes $c_j^2$. Because of operations 1 and
2, we may suppose that each bit of $T_0$ is addressable. Let $a_i$ be the address (index) of the starting point
of $\mathit{str}_{i \log n}$ in $T_0$, so that $0 \le a_i < 4n$, for $i = 0, 1, \ldots, n/\log n$. A second table $T_1$ contains, in binary,
the indices $a_0, a_1, \ldots, a_{n/\log n}$. If $a_{i+1} - a_i < 2 \log n$, the table information for intermediate $c_j^2$ (i.e., for
$i \log n < j < (i+1) \log n$) can be readily decoded via $O(1)$ accesses to $T_0$ and $T_1$ and $O(1)$ of our decoding
operations.

The case $a_{i+1} - a_i \ge (\log n)^2$ is handled via a third table $T_2$, which stores, starting in (bit) location $a_i$,
the $\log n - 1$ binary indices for the starting locations of $\mathit{str}_j$, $i \log n < j < (i+1) \log n$.

The remaining case, $2 \log n \le a_{i+1} - a_i < (\log n)^2$, is handled with an additional level of refinement.
Let $b_{i,j}$ be the address (index in $T_0$) of the starting point of $\mathit{str}_{i \log n + j \log\log n}$ ($j < \log n / \log\log n$ and
$a_i \le b_{i,j} < a_{i+1}$). In $T_2$, we store, starting in location $a_i$, the binary offsets $b_{i,1} - a_i, b_{i,2} - a_i, \ldots$. Each
offset is stored as a $2 \lceil \log\log n \rceil$-bit binary number. If $b_{i,j+1} - b_{i,j} < 2 \log n$, the information for intermediate
$c_l^2$, $i \log n + j \log\log n < l < i \log n + (j+1) \log\log n$, is readily decoded; for all other cases, the offsets (of size
$\log\log n$) of all intermediate $c_l^2$ are encoded through one last level of indirection in a table $T_3$, starting at
location $b_{i,j}$.
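
The core of this encoding can be sketched as follows. This is a deliberately simplified illustration: only $T_0$ and the sampled index table $T_1$ are built, the scan inside a block is linear rather than table-driven, and the refinement tables $T_2$ and $T_3$ are omitted. The names `encode_unary`, `decode` and the sampling parameter `step` (standing in for $\log n$) are hypothetical.

```python
def encode_unary(squares, step):
    """Build T0 (unary strings, each closed by a 0) and T1 (sampled starts)."""
    t0 = "".join("1" * s + "0" for s in squares)
    t1, pos = [], 0
    for j, s in enumerate(squares):
        if j % step == 0:
            t1.append(pos)            # a_i: start of str_{i*step} in T0
        pos += s + 1
    return t0, t1

def decode(t0, t1, step, j):
    """Return (C_j, c_j^2) for bucket j."""
    pos = t1[j // step]               # jump to the sampled start
    for _ in range(j % step):         # skip the strings before str_j
        pos = t0.index("0", pos) + 1
    end = t0.index("0", pos)
    # str_j starts after j separators, so its start is C_j + j.
    return pos - j, end - pos
```

The point of the sketch is the identity used in `decode`: the starting address of $\mathit{str}_j$ in $T_0$ equals $C_j + j$, so the offsets $C_j$ never need to be stored explicitly.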

The table $K$ can be encoded in exactly the same way. In particular, a table $K_0$ contains, for its $i$-th
sequence of bits, the integer $\alpha_i$ in unary, if $k_{\alpha_i}$ is the multiplier assigned to hash bucket $i$, $0 \le i < n$. (Recall
that the first multiplier (encoded by the string "1") is usable for at least half of the hash buckets.) The total
length of the sequence comprises at most $n$ 0's, $n/2$ singleton 1's, $n/4$ doubleton 1's, etc., for a total length
of $3n$. The multipliers $k_1, \ldots, k_{\log n}$ are stored in a $\log n$-word array.
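
The length bound can be checked directly. Under the idealized assumption that exactly $n/2^j$ buckets are served by the $j$-th multiplier (each contributing $j$ ones) and that every bucket contributes one separating zero, the total is:

```latex
\underbrace{n}_{\text{0's}} \;+\; \sum_{j \ge 1} j \cdot \frac{n}{2^{j}}
  \;=\; n + n \sum_{j \ge 1} \frac{j}{2^{j}}
  \;=\; n + 2n \;=\; 3n .
```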

It is evident that we have encoded a perfect hash function in $O(n)$ bits. To summarize, the above
construction, combined with the lower bound of [Me84], gives

Theorem 7.

The spatial complexity of an $O(1)$-time perfect (1-probe) hash function for a set of $n$ elements belonging
to the universe $[1, m]$ is $\Theta(\log\log m + n)$.

4.2 Related constructions and corresponding bounds

We have analyzed the complexity of hash function families with the property that they contain, for any
$n$-element $S \subset U$, at least one "good" hash function. Closely related are universal hash function families
[CW79], which have the property that, for any such subset, most functions behave "adequately" well. It
turns out that the existence of a good function in the first model is frequently proven via an averaging
argument, which can usually be modified to yield a family of universal hash functions. Not surprisingly, the
same type of functions are generally used in both models.

The appropriate definitions in [CW79] and [Me82] can be combined to define a family $H$ of $c$-universal$_k$
hash functions for a universe $U$ as follows.

Let $H$ be a collection of functions $U \to [1, n]$. We call $H$ $c$-universal$_k$ if, for $j \le k$ and
$x_1 < x_2 < \cdots < x_j \in U$,

$$\mathrm{Prob}_{h \in H}\{h(x_1) = h(x_2) = \cdots = h(x_j)\} \le \frac{c}{n^{j-1}}.$$

One can also define a family of universal hash functions with respect to any property $Q$ and probability $p$:
We say that $H$ is $(p; Q)$-universal if for any suitably restricted $S \subset U$,

$$\mathrm{Prob}_{h \in H}\{h \text{ restricted to } S \text{ has property } Q\} \ge p.$$

We first derive upper and lower bounds for $c$-universal$_k$ hash functions and then show how to construct
universal families of oblivious $\log n$-probe hash functions.

A useful lemma for establishing universal properties was established in [CW79] and extended in [WC79]:

Lemma [CW79].

Let $p > m$ be a prime. Let $H = \{h \mid h$ is a polynomial of degree $k - 1$ over the field $Z_p\}$. Let $F = \{f \mid
f = h \bmod n,\ h \in H\}$. The family $F$ of functions $U \to [1, n]$ is $(1 + o(1))$-universal$_k$.
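
For very small parameters the definition can be checked by brute force. The sketch below is illustrative only: it computes the smallest $c$ for which a given finite family is $c$-universal$_k$ (restricted to $2 \le j \le k$, since $j = 1$ only forces $c \ge 1$), and applies it to the degree-1 polynomial family of the lemma with the assumed toy values $p = 7$, $n = 3$ and universe $Z_p$. The function name `smallest_c` is hypothetical, and the range is taken as $\{0, \ldots, n-1\}$ rather than $[1, n]$.

```python
from fractions import Fraction
from itertools import combinations

def smallest_c(family, universe, n, k):
    """Least c with Prob[h(x_1)=...=h(x_j)] <= c / n^(j-1) for 2 <= j <= k."""
    best = Fraction(0)
    for j in range(2, k + 1):
        for xs in combinations(universe, j):
            hits = sum(1 for h in family if len({h(x) for x in xs}) == 1)
            best = max(best, Fraction(hits, len(family)) * n ** (j - 1))
    return best

p, n = 7, 3
# All polynomials a*x + b over Z_p, reduced mod n (the case k = 2).
family = [lambda x, a=a, b=b: ((a * x + b) % p) % n
          for a in range(p) for b in range(p)]
c = smallest_c(family, range(p), n, k=2)
```

For such tiny values the constant exceeds 1 only because $p$ is not a multiple of $n$; the $o(1)$ term in the lemma accounts for this rounding effect.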

Another  useful  lemma  was  established  in  [KU86],  a  variation  of  which  we  reproduce  here: 

Lemma  [KU86]. 

Let $p_m$ be a fixed prime greater than $m$.
Let  H^  =  {h  \  h{x)  =  g{x)  mod  (n/c),  g  polynomial  of  degree  ^  over  the  field  Zj,^  }. 
Then  for  j  >  max(e.  c),  S  G  U,  \S\  =  n  :     Probh^n^  {maxi<.<„/e  \h-\i)  n  5|  >  j)  <  "( jff  )^-       I 

4.2.1 $c$-universal$_k$ hashing

Mehlhorn ([Me82]) showed that $c$-universal$_2$ hash functions have a spatial complexity of $\Omega(\log n +
\log\log m - \log c)$ bits, and gave an explicit construction of a family of 8-universal$_2$ hash functions of matching
spatial complexity. The following slight variation of his proof shows that a family of $c$-universal$_k$ hash
functions must contain at least $\frac{n^{k-1}}{c}\left(\frac{\log m}{\log n} - 1\right)$ functions, and the spatial complexity of a function in the
set is therefore $\Omega(k \log n + \log\log m - \log c)$ bits.


All random selection will be with respect to the uniform distribution on the set in question.

Let $H = \{h_1, \ldots, h_{|H|}\}$ be a $c$-universal$_k$ family of hash functions. Since the range of $h_1$ is $\{1, \ldots, n\}$,
there is a set $S_1 \subset U$, $|S_1| \ge m/n$, where $h_1$ is constant when restricted to $S_1$. By induction, there are
sets $S_j \subset S_{j-1} \subset \cdots \subset U$, $|S_j| \ge m/n^j$, and $h_1, h_2, \ldots, h_j$ are constant when restricted to $S_j$. Let $j_0$ be
the maximal $j$ for which $|S_j| \ge k$. Let $x_1, \ldots, x_k \in S_{j_0}$. Since $H$ is $c$-universal$_k$, and all $x_i$ collide under
$h_1, \ldots, h_{j_0}$, we must have $|H| \ge \frac{n^{k-1}}{c} j_0$. Since $j_0 \ge \frac{\log m}{\log n} - 1$, $|H| \ge \frac{n^{k-1}}{c}\left(\frac{\log m}{\log n} - 1\right)$, and hence
$\log |H| \ge (1 - o(1))((k - 1) \log n - \log c + \log\log m)$.
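
The pigeonhole step of this argument can be made concrete. The sketch below is an illustration under assumed toy data: it builds the nested sets $S_1 \supset S_2 \supset \cdots$ greedily by keeping, at each step, the largest set of elements on which the next function is constant. The name `nested_constant_sets` is hypothetical.

```python
from collections import Counter

def nested_constant_sets(family, universe, n):
    """Return [S_1, S_2, ...] with h_1..h_j constant on S_j."""
    sets, current = [], list(universe)
    for h in family:
        # Some value is taken by at least |current| / n elements (pigeonhole).
        value, _ = Counter(h(x) for x in current).most_common(1)[0]
        current = [x for x in current if h(x) == value]
        sets.append(current)
    return sets
```

Each step shrinks the surviving set by a factor of at most $n$, which is where the bound $|S_j| \ge m/n^j$ comes from.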

On  the  other  hand,  one  can  easily  show  that  a  randomly  selected  set  of  i-r/'Mog  C^')  hash  functions, 
(for $c > 3$), is $c$-universal$_k$ for $U$.

Remark 8: The spatial complexity for $c$-universal$_k$ hash functions (for $c > 3$) is $(1 \pm o(1))(\log\log m +
k \log n - \log c)$.

It is also not difficult to construct an explicit family of $(1 + o(1))$-universal$_k$ hash functions that is quite
small, albeit not optimal. We first appeal to the following variation of Lemma 2 from [FKS84]:

Remark 9: Let $R_k = \{p \mid p$ prime and $p \in (n^k \log m, (2 + \epsilon) n^k \log m)\}$. Then

$$\forall x \ne y \in U: \quad \mathrm{Prob}_{p \in R_k}\{x \equiv y \bmod p\} < n^{-k}.$$

Proof: By the prime number theorem we have

$$\sum_{p \in R_k} \log p = (1 + \epsilon) n^k \log m + o(n^k \log m).$$

Let $P_{x,y} = \{p \mid p$ prime and $p$ divides $x - y\}$. Then

$$\sum_{p \in P_{x,y}} \log p \le \log m,$$

and it follows that $|P_{x,y} \cap R_k| / |R_k| < n^{-k}$. ∎
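
The bound is asymptotic, but it can be sanity-checked numerically for small parameters. The sketch below makes several assumptions for illustration: logarithms are base 2, $m$ is a power of two given by `log_m`, primality is tested by trial division, and the name `worst_collision_fraction` is hypothetical. It returns the largest fraction of primes in the interval that divide any single difference $d = x - y$ with $0 < d < m$.

```python
from math import ceil

def is_prime(q):
    return q >= 2 and all(q % d for d in range(2, int(q ** 0.5) + 1))

def worst_collision_fraction(n, k, log_m, eps):
    """Max over differences d of |{p in R_k : p divides d}| / |R_k|."""
    N = n ** k * log_m
    # Primes strictly between N and (2 + eps) * N.
    R = [q for q in range(N + 1, ceil((2 + eps) * N)) if is_prime(q)]
    m = 2 ** log_m
    worst = max(sum(1 for q in R if d % q == 0) for d in range(1, m))
    return worst / len(R), len(R)

frac, size = worst_collision_fraction(n=4, k=2, log_m=12, eps=0.5)
```

For these values every difference is divisible by at most one prime of the interval, so the fraction is simply the reciprocal of the number of primes found, well below $n^{-k} = 1/16$.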

Let $H_1 = \{h \mid h(x) = ((ax + b) \bmod p) \bmod n^k,\ p \in R_k,\ a \ne 0,\ b \in \{0, 1, 2, \ldots, p - 1\}\}$.

Let $p_k$ be a fixed prime greater than $n^k$.

Let $H_2 = \{h \mid h$ is a polynomial of degree $k - 1$ with coefficients in the field $Z_{p_k}\}$.

Put $F = \{f \mid f(x) = (h_2 \circ h_1(x) \bmod p_k),\ h_2 \in H_2,\ h_1 \in H_1\}$. It follows that for $j \le k$,
$x_1 < x_2 < \cdots < x_j \in U$,

Probj^r{f(xi)  =  fix.)  =  ■■■  =  fixj)]  <  fn''  +  {l+o(l))n-^  +  \ 

and $F$ is $(1 + o(1))$-universal$_k$ for $U$ with spatial complexity $k(k + 3) \log n + 3 \log\log m$.

4.2.2 Hash functions with large probe numbers

In this subsection, we give a construction for $k$-probe hash functions of interest. Our probabilistic
upper bounds from Section 3 show that there is a $\log n$-probe oblivious search strategy for full tables, and
a 3-probe oblivious search strategy for 50% full tables, which each use only $O(\log\log m + \log n)$ additional
space. We describe an explicit construction of a class of uniform oblivious $\log n$-probe hashing schemes,
which map elements from a set $S$ of size $n$ into a table of size $(1 + \epsilon)n$ ($\epsilon > 0$), and which can be described
in $O(\log\log m + \log^2 n)$ bits.

Construction 10.

We construct a family $H = \{\langle f, f_1, \ldots, f_{k-1} \rangle\}$, $k = O(\log n)$, of functions from $U$ into $[1, \ldots, (1 + \epsilon)n]$.
$H$ will have the property that for any $n$-element set $S \subset U$, a randomly chosen $h \in H$ will, with high
probability, be a valid $k$-probe hash function mapping $S$ into $[1, \ldots, (1 + \epsilon)n]$.

Let $R_3$ be the set of all primes in $(n^3 \log m, (2 + \epsilon) n^3 \log m)$.
Let $G_1 = \{h \mid h(x) = ((ax + b) \bmod p) \bmod n^3,\ p \in R_3,\ a \ne 0,\ b \in \{0, 1, \ldots, p - 1\}\}$.
Let $p_3$ be a fixed prime greater than $n^3$, and set $c_0 = 2 + 4/\epsilon$.
Let $G_2' = \{h \mid h$ is a polynomial of degree $c_0 \log n$ over the field $Z_{p_3}\}$.
Let  Go  =  {/(  I  /i  =  /?'  mod  (771^),/''  €  G'^},  where  cj  =  2c2/£. 
Finally, put $k = (1 + \epsilon) c_1 \log n$, and let

$$H = \{\langle f, f_1, \ldots, f_{k-1} \rangle \mid f = h_2 \circ h_1,\ (h_1, h_2) \in G_1 \times G_2, \text{ and } f_j(x) = f(x) + jn/(c_1 \log n)\}.$$

The family $H$ of $k$-probe functions is universal in the following sense:

$$\mathrm{Prob}_{h \in H}\{h \text{ is a valid } k\text{-probe function for } S\} \ge 1 - 2n^{-1}.$$

Proof: The family $G_1$ is universal with respect to the property $Q$ and probability $(1 - n^{-1})$, where
$Q = \{\forall x \ne y \in S:\ h(x) \ne h(y)\}$. That is, for a randomly chosen $h_1 \in G_1$:
$\mathrm{Prob}\{\exists x \ne y \in S \text{ s.t. } h_1(x) = h_1(y)\} \le 1/n$. (This follows from Remark 9.)
The family $G_2$ is universal with respect to the property $Q_1$ and probability $(1 - n^{-1})$, where
$Q_1 = \{\forall i \in \{1, \ldots, n/(c_1 \log n)\}:\ |g^{-1}(i)| \le (1 + \epsilon) c_1 \log n\}$. That is, for a randomly chosen $h_2 \in G_2$, restricted
to $S$: $\mathrm{Prob}\{Q_1\} \ge 1 - n^{-1}$. (This follows from Lemma [KU86].)

Let $f = h_2 \circ h_1$ be a randomly chosen function in $H$. By construction, $h_1$ is collision free on $S$ with
probability at least $1 - 1/n$ and the collision chains created by $h_2$ are bounded by $(1 + \epsilon) c_1 \log n$ with probability
at least $1 - 1/n$. The $k$-probe function $\langle f, f_1, \ldots, f_{k-1} \rangle$, as defined above, is therefore a valid $O(\log n)$-probe hash
function for $S$ with probability at least $1 - (2/n)$. ∎
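
The probing scheme itself is easy to sketch once $f$ is fixed. The code below is an illustration under simplifying assumptions, not the construction as stated: the first level uses one fixed prime `p` instead of a random prime from $R_3$ and omits the reduction modulo $n^3$, the number of buckets, the probe count `k` and the polynomial degree are free parameters rather than the constants of Construction 10, and random choices are simply retried until every bucket holds at most `k` keys. The names `build_probe_table` and `lookup` are hypothetical.

```python
import random

def build_probe_table(keys, buckets, k, p, degree, rng):
    """Retry random f = h2 o h1 until every bucket receives at most k keys."""
    while True:
        a, b = rng.randrange(1, p), rng.randrange(p)
        coeffs = [rng.randrange(p) for _ in range(degree + 1)]

        def f(x, a=a, b=b, coeffs=coeffs):
            y = (a * x + b) % p          # simplified first-level function h1
            v = 0
            for c in coeffs:             # Horner evaluation of h2 over Z_p
                v = (v * y + c) % p
            return v % buckets

        table = [None] * (k * buckets)
        fill = [0] * buckets
        ok = True
        for x in keys:
            i = f(x)
            if fill[i] == k:             # bucket overflow: f is not valid
                ok = False
                break
            table[i + fill[i] * buckets] = x   # slot f_j(x) = f(x) + j*buckets
            fill[i] += 1
        if ok:
            return f, table

def lookup(f, table, buckets, k, x):
    """Oblivious search: the probe sequence depends only on the key x."""
    i = f(x)
    return any(table[i + j * buckets] == x for j in range(k))
```

The search is oblivious in the sense used throughout the paper: the $k$ probed locations $f(x), f(x) + B, \ldots, f(x) + (k-1)B$ (with $B$ the number of buckets) are determined by $x$ alone and never by what earlier probes returned.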

The bit-complexity of the $k$-probe function is the same as for the function $f$, which is $O(\log\log m + \log^2 n)$.

Conclusions and open problems

We have shown how to construct explicit space-time optimal perfect hash functions, and given some related
universal hashing constructions. We have given tight bounds on the spatial complexity of oblivious $k$-probe
hash functions, and helped quantify the difference between oblivious and non-oblivious strategies. In
particular, we have shown that, for full tables, $O(1)$ additional oblivious probes can reduce the requisite size
of the search program by a constant factor, but no more. As is well known, the decrease in program size, for
partially full search tables, is more dramatic: probabilistic arguments show the formal existence of oblivious
$O(1)$-probe schemes that need only $O(\log\log m + \log n)$ bits. It would be interesting to find an explicit
construction of a small $O(1)$-probe oblivious family of hash functions that map $n$ elements into a table of
size $(1 + \epsilon)n$.